Aeroacoustic post-processing with MapReduce
Abstract
Present-day large-scale computational fluid dynamics simulations can easily produce tens, if not hundreds, of terabytes of useful data. While computational capacity continues to increase according to Moore’s law, the speed of input-output (I/O) to data storage systems has not increased at the same rate. This means that the gap between processing speed and bandwidth to storage systems is increasing exponentially. This trend is in part fueled by the fact that supercomputer power is most often measured in floating point operations per second (FLOPS), while other metrics receive less attention. If the gap between processing and storage speed continues to grow, it will drive scientific data processing towards an in-situ paradigm in which very little data are ever stored. Instead, “post-processing” routines will need to be performed “on-the-fly,” in tandem with the simulations that they analyze. If a different data analysis is desired in the future, it will require re-running the simulation. Note that this paradigm may be at odds with the usual scientific procedure, in which data are collected and then analyzed in a progressive fashion: understanding one aspect of the data naturally leads to a host of additional questions. In other words, scientific data sets often contain unexpected effects that cannot be predicted ahead of time. The purpose of this brief, therefore, is to explore new technologies enabling fast access to large quantities of stored data as an alternative to the in-situ paradigm. In particular, we look to data-access techniques developed for web search engines like Google, which must constantly query enormous databases pertaining to the state of the Internet. Such databases are too large to be stored on any one disk; instead, they reside on thousands or tens of thousands of disks. The solution to fast access lies in expressing a query in a special format known as “MapReduce,” which is in itself a programming paradigm.
If a post-processing task is expressible in this fashion, MapReduce enables the code implementing the task (map phase) to be sent directly to the data residing on distributed disks, rather than requiring the data to be sent to the code. Only a small amount of data representing the desired result is communicated at the end (reduce phase). In this way, effective I/O throughput may be dramatically increased.
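The map and reduce phases described above can be illustrated with a minimal, single-process sketch. It assumes a hypothetical post-processing task, the root-mean-square of pressure samples, which is chosen only for illustration and is not the task treated in the brief. The names `map_chunk`, `combine`, and `rms_pressure` are likewise assumptions, and the in-memory lists stand in for data blocks that a real framework would keep on separate disks.

```python
from functools import reduce
from math import sqrt


def map_chunk(chunk):
    """Map phase: run next to one data block and return a small partial result.

    Returns (sum of squared samples, number of samples) for the block.
    """
    return (sum(p * p for p in chunk), len(chunk))


def combine(a, b):
    """Reduce phase: merge two partial results; the order of merging does not matter."""
    return (a[0] + b[0], a[1] + b[1])


def rms_pressure(chunks):
    """Root-mean-square over all samples, computed from per-block partial results."""
    # In a real MapReduce framework each map_chunk call would run on the node
    # that stores the block; here they simply run one after another.
    partials = map(map_chunk, chunks)
    total_sq, count = reduce(combine, partials, (0.0, 0))
    return sqrt(total_sq / count) if count else 0.0


# Hypothetical pressure samples split across three "disks".
chunks = [[3.0, 4.0], [0.0], [5.0, 0.0, 0.0]]
result = rms_pressure(chunks)
```

Each block contributes only two numbers to the reduce step regardless of how many samples it holds, which is the sense in which only a small amount of data is communicated at the end.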
Related resources
On the aeroacoustic properties of a beveled plate
The flow around a beveled flat plate model with an asymmetric 25 degrees trailing edge with three rounding radii is analyzed using a Navier-Stokes based open source software package OpenFOAM in order to predict the aeroacoustic properties of the models. A Large Eddy Simulation with a dynamic Smagorinsky and implicit model are used as closure model for the flow solver, and are compared regarding...
Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming
The objective of this study is to verify the importance of the capabilities of cloud computing services in managing and analyzing big data in business organizations because the rapid development in the use of information technology in general and network technology in particular, has led to the trend of many organizations to make their applications available for use via electronic platforms hos...
Adaptive Dynamic Data Placement Algorithm for Hadoop in Heterogeneous Environments
Hadoop MapReduce framework is an important distributed processing model for large-scale data intensive applications. The current Hadoop and the existing Hadoop distributed file system’s rack-aware data placement strategy in MapReduce in the homogeneous Hadoop cluster assume that each node in a cluster has the same computing capacity and a same workload is assigned to each node. Default Hadoop d...
A New Parallelization Method for K-means
K-means is a popular clustering method used in data mining area. To work with large datasets, researchers propose PKMeans, which is a parallel k-means on MapReduce [3]. However, the existing k-means parallelization methods including PKMeans have many limitations. It can’t finish all its iterations in one MapReduce job, so it has to repeat cascading MapReduce jobs in a loop until convergence. On...
Beamforming of aeroacoustic sources in the time domain
A classical array processing technique used for the analysis of aeroacoustic sources is the frequency-domain beamforming technique. The use of this technique requires an assumption on the stationarity of the sources as it works with a time-averaged estimate of the cross-spectral matrix. As a consequence this technique provides an estimation of the average position (in space and time) of an aero...